Synthesis of sentence-level speech based on measured vocal tract area functions
Authors
Abstract
A parametric model of the vocal tract is developed based on an inventory of area functions acquired for one male subject with Magnetic Resonance Imaging (MRI). The model is then used to synthesize a sentence recorded from the same subject.

AREA FUNCTION MODEL OF THE VOCAL TRACT

The question addressed in this paper is whether an inventory of vocal tract area functions, which represent only static speech sounds, can be successfully used to synthesize dynamic (i.e. connected) speech recorded from a speaker. This is a form of articulatory synthesis in which the effective shape of the vocal tract airway is manipulated directly, rather than by explicitly defining the position and movement of individual articulators (e.g. tongue tip and body, velum, lips, jaw, etc.). For this discussion, an area function inventory for one male speaker acquired with Magnetic Resonance Imaging (MRI) (1) will be used.

The area function at any instant of time is represented as the combination of a vowel substrate and an imposed consonantal element. Such an approach was proposed by (2) and further developed by (3) and (4). The vowel substrate V for this study is represented by a principal components analysis (PCA) of the ten vowel area functions in (1,5) as

$$V(x) = \sum_{j=1}^{N} c_j \phi_j(x) + \Omega(x) \qquad (1)$$

where x is the distance from the glottis, the $\phi_j(x)$ are the principal components (or eigenvectors), the $c_j$ are the coefficients produced by the PCA that reconstruct each of the original ten vowels, and $\Omega(x)$ is the mean area function. Thus, any vowel is represented as a perturbation around the mean area function. It has been found that four of the principal components (i.e. 4 coefficients) can reconstruct each area function to within 3 percent of the original. For a time-varying area function, the $c_j$ become time-dependent parameters,

$$V(x,t) = \sum_{j=1}^{N} c_j(t) \phi_j(x) + \Omega(x) . \qquad (2)$$
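The PCA representation of the vowel substrate described above can be illustrated with a minimal sketch. Several things here are assumptions and do not come from the paper: the area functions are random placeholder data (10 vowels, 44 tube sections), the function names `fit_vowel_pca`, `reconstruct`, and `interpolate_coeffs` are hypothetical, and the linear interpolation of coefficients between two vowels is only one simple way to make the coefficients time-dependent.

```python
import numpy as np

def fit_vowel_pca(areas):
    """PCA of area functions; areas has shape (n_vowels, n_sections)."""
    mean = areas.mean(axis=0)                 # mean area function
    centered = areas - mean
    # Rows of vt are the principal components (eigenvectors).
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    coeffs = u * s                            # one coefficient row per vowel
    return mean, vt, coeffs

def reconstruct(mean, components, coeffs, n_keep):
    """Mean plus the weighted sum of the first n_keep components."""
    return mean + coeffs[..., :n_keep] @ components[:n_keep]

def interpolate_coeffs(c_start, c_end, n_frames):
    """Assumed linear trajectory of the coefficients between two vowels."""
    w = np.linspace(0.0, 1.0, n_frames)[:, None]
    return (1.0 - w) * c_start + w * c_end

# Placeholder data only: NOT measured area functions.
rng = np.random.default_rng(0)
areas = 3.0 + rng.normal(scale=0.5, size=(10, 44))

mean, components, coeffs = fit_vowel_pca(areas)

# Relative reconstruction error as more components are kept.
errors = [
    np.linalg.norm(reconstruct(mean, components, coeffs, k) - areas)
    / np.linalg.norm(areas)
    for k in range(1, 11)
]

# Time-varying area function: glide from vowel 0 to vowel 1 over 20 frames.
c_t = interpolate_coeffs(coeffs[0], coeffs[1], n_frames=20)
frames = reconstruct(mean, components, c_t, n_keep=4)  # shape (20, 44)
```

With measured area functions in place of the placeholder array, `errors` would show how many components are needed to reach a given reconstruction accuracy, and each row of `frames` would be one instantaneous area function along the transition.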
Similar resources
Determination of the vocal-tract shape from measured formant frequencies.
We model the vocal tract as a lossless acoustic tube and consider the relationship between the resonant frequencies and the cross-sectional area function. Empirical results show that if the logarithm of the area function is band limited preserving only 2n Fourier components, the lowest n pole and n zero frequencies of the admittance function measured at the lips uniquely determine the area coef...
Development of speech synthesis simulation system and study of timing between articulation and vocal fold vibration for consonants /p/, /t/ and /k/
This paper describes development of an articulatory speech synthesis system using a transmission line model. A speech synthesis method that simulates speech production process has a potential to produce human-like speech. However, there are many parameters to be set such as vocal tract area functions and timing between articulatory movement and vocal fold vibration. A simulation system that all...
Modeling of a speech production system based on MRI measurement of three-dimensional vocal tract shapes during fricative consonant phonation
This study, based on the measurement of three-dimensional (3-D) vocal tract shapes during fricative consonant phonation, presents a realistic modeling of a human speech production system. The 3-D shapes of a vocal tract and a dental crown were measured using Magnetic Resonance Imaging (MRI). A male subject was asked to produce the fricatives /s/ and /ʃ/ while wearing a dental crown plate that c...
Visualisation of the vocal tract based on estimation of vocal area functions and formant frequencies
A system for visualisation of the vocal-tract shapes during vowel articulation has been designed and developed. The system generates the vocal tract configuration using a new approach based on extracting both the area functions and the formant frequencies from the acoustic speech signal. Using a linear prediction analysis, the vocal tract area functions and the first three formants are first es...
Noise Sources and Area Functions for the Synthesis of Fricative Consonants
In this study, we characterize the noise sources and the critical parts of vocal tract area functions for the synthesis of voiceless fricatives. We derive these characteristics indirectly by fitting synthetic to natural fricative spectra in an interactive procedure. The adjustable parameters are the number, location, type, amplitude, and spectral shape of the noise sources as well as the cross-...